Counterfactual Regret Minimization for Decentralized Planning
Abstract
Regret minimization is an effective technique for almost surely producing Nash equilibrium policies in coordination games in the strategic form. Decentralized POMDPs offer a realistic model for sequential coordination problems, but they yield games of doubly exponential size in the strategic form. Recently, counterfactual regret has offered a way to decompose total regret along an (extensive-form) game tree into components that can be individually controlled, such that minimizing all of them minimizes the total regret as well. However, a straightforward extension of this decomposition to decentralized POMDPs leads to a complexity exponential in both the joint action and joint observation spaces. We present a more tractable approach to regret minimization in which the regret is decomposed along the nodes of the agents' policy trees, yielding a complexity exponential only in the joint observation space. We present an algorithm, REMIT, that minimizes regret via this decomposition and prove that it converges to a Nash equilibrium policy in the limit. We also equip REMIT with a stronger convergence criterion: if this criterion is met, the algorithm must output a Nash equilibrium policy in finite time. We found empirically that in every benchmark problem we tested, this criterion was indeed met and (near-)optimal Nash equilibrium policies were achieved.
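The strategic-form regret minimization the abstract builds on is commonly realized by regret matching: each agent accumulates, per action, how much better that action would have done than its current mixed strategy, and plays each action in proportion to its positive cumulative regret; the time-averaged strategies then approach equilibrium. The sketch below illustrates this on a small two-agent coordination matrix game. It is a minimal illustration of plain regret matching, not the REMIT algorithm or its policy-tree decomposition; the function and game here are invented for the example.

```python
import numpy as np

def regret_matching(payoff, iters=20000):
    """Self-play regret matching on a 2-agent coordination matrix game.

    payoff[i, j] is the shared payoff when the row agent plays i and the
    column agent plays j (both agents maximize the same value).
    Returns the average strategies, which converge toward an equilibrium.
    """
    n, m = payoff.shape
    regret_row, regret_col = np.zeros(n), np.zeros(m)
    sum_row, sum_col = np.zeros(n), np.zeros(m)

    def strategy(regret, k):
        # Play in proportion to positive cumulative regret; uniform if none.
        pos = np.maximum(regret, 0.0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(k, 1.0 / k)

    for _ in range(iters):
        s_row = strategy(regret_row, n)
        s_col = strategy(regret_col, m)
        sum_row += s_row
        sum_col += s_col
        # Expected payoff of each pure action against the other agent's mix.
        u_row = payoff @ s_col
        u_col = payoff.T @ s_row
        # Regret: how much better each action did than the mixed strategy.
        regret_row += u_row - s_row @ u_row
        regret_col += u_col - s_col @ u_col

    return sum_row / iters, sum_col / iters

# Coordination game: matching on action 0 pays 2, on action 1 pays 1,
# mismatching pays 0.
game = np.array([[2.0, 0.0],
                 [0.0, 1.0]])
p, q = regret_matching(game)
```

On this game the average strategies of both agents concentrate on the payoff-dominant joint action (0, 0). The doubly exponential blow-up the abstract refers to arises because, for a Dec-POMDP, each "action" in such a matrix would be an entire policy tree mapping observation histories to actions.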
Similar Works
Regret Minimization in Games with Incomplete Information
Extensive games are a powerful model of multiagent decision-making scenarios with incomplete information. Finding a Nash equilibrium for very large instances of these games has received a great deal of recent attention. In this paper, we describe a new technique for solving large games based on regret minimization. In particular, we introduce the notion of counterfactual regret, whi...
Inference-based Decision Making in Games
Background: Reinforcement learning in complex games has traditionally been the domain of value- or policy-iteration algorithms, resulting from their effectiveness in planning in Markov decision processes, before algorithms based on regret minimization guarantees such as upper confidence bounds applied to trees (UCT) and counterfactual regret minimization were developed and proved to be very succe...
Multi-Agent Planning with Baseline Regret Minimization
We propose a novel baseline regret minimization algorithm for multi-agent planning problems modeled as finite-horizon decentralized POMDPs. It is guaranteed to produce a policy that is provably at least as good as a given baseline policy. We also propose an iterative belief generation algorithm to efficiently minimize the baseline regret, which only requires necessary iterations so as to converge ...
Monte Carlo Sampling for Regret Minimization in Extensive Games
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome samplin...
Slumbot NL: Solving Large Games with Counterfactual Regret Minimization Using Sampling and Distributed Processing
Slumbot NL is a heads-up no-limit hold'em poker bot built with a distributed disk-based implementation of counterfactual regret minimization (CFR). Our implementation enables us to solve a large abstraction on commodity hardware in a cost-effective fashion. A variant of the Public Chance Sampling (PCS) version of CFR is employed, which works particularly well with